Online advertising has emerged in recent years as one of the most efficient methods of advertising. Yet advertisers are concerned about the efficiency of their online advertising campaigns and would therefore like to restrict their ad impressions to certain websites and/or certain audience groups. These restrictions, known as targeting criteria, trade reachability for better performance. This trade-off between reachability and performance calls for a forecasting system that can estimate it quickly and with good accuracy. Designing such a system is challenging because of (a) the huge amount of data to process and (b) the need for estimates that are both fast and accurate. In this paper, we propose a distributed, fault-tolerant system that generates such estimates quickly and with good accuracy. The main idea is to keep a small representative sample in memory across multiple machines and to formulate the forecasting problem as queries against that sample. The key challenge is to find the best strata across the past data and to perform multivariate stratified sampling while ensuring a fuzzy fall-back that covers small minorities. Our results show a significant improvement over the uniform and simple stratified sampling strategies currently in wide use in the industry.
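To make the "queries against a sample" idea concrete, the following is a minimal single-machine sketch under stated assumptions; it is not the system described above. It stratifies past impressions by a combination of attributes, keeps a weighted sample per stratum, and answers a targeting query by summing the weights of matching sampled records. The per-stratum floor (`min_per_stratum`) is a simple, hypothetical stand-in for keeping small minorities represented; the abstract does not specify how its fuzzy fall-back works. The names `stratified_sample`, `estimate_reach`, and the toy `site`/`country` attributes are all illustrative, and distribution across machines and fault tolerance are omitted.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_key, rate, min_per_stratum, seed=0):
    """Hypothetical sketch: sample each stratum at `rate`, but keep at least
    `min_per_stratum` records (or the whole stratum, if smaller) so that small
    minority strata remain represented. Each kept record carries the weight
    stratum_size / kept_count, so the weights in a stratum sum to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[strata_key(rec)].append(rec)
    sample = []
    for members in strata.values():
        # Keep at least one record per non-empty stratum, never more than it holds.
        k = min(len(members), max(1, min_per_stratum, round(rate * len(members))))
        weight = len(members) / k
        sample.extend((rec, weight) for rec in rng.sample(members, k))
    return sample


def estimate_reach(sample, targeting):
    """Estimate how many past impressions satisfy a targeting predicate."""
    return sum(w for rec, w in sample if targeting(rec))


# Toy past data: 3 sites x 2 countries; "DE" is a small minority (5 per site).
impressions = [
    {"site": s, "country": c}
    for s in ("news", "sports", "blog")
    for c in ("US", "DE")
    for _ in range(100 if c == "US" else 5)
]
sample = stratified_sample(
    impressions, lambda r: (r["site"], r["country"]), rate=0.1, min_per_stratum=3
)
de_reach = estimate_reach(sample, lambda r: r["country"] == "DE")
news_us_reach = estimate_reach(
    sample, lambda r: r["site"] == "news" and r["country"] == "US"
)
print(len(sample), round(de_reach, 6), round(news_us_reach, 6))
```

With a plain uniform 10% sample, each 5-record "DE" stratum would contribute only 0.5 records in expectation and could vanish from the sample entirely. The floor costs a few extra records of memory but keeps every stratum queryable. Because these toy queries align with the stratum boundaries, the estimates here recover the true counts (15 and 100). Queries that cut across strata would instead carry sampling error.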